Deepseek V3.1 native tool calling support (OpenAI Style) by createthis · Pull Request #15533 · ggml-org/llama.cpp

createthis · 2025-08-23T21:22:33Z

This PR enables DeepSeek V3.1 thinking mode as the default. Disable with --reasoning-budget 0.

It also implements tool calling support.

Addresses #15496

My understanding is that this is a continuation of #9639 for DeepSeek V3.1 specifically.

- Added COMMON_CHAT_FORMAT_DEEPSEEK_V3_1 enum value
- Created common_chat_params_init_deepseek_v3_1() function (currently uses R1 implementation)
- Created common_chat_parse_deepseek_v3_1() function that handles V3.1 thinking format:
  - Extracts reasoning content before '</think>' tag into reasoning_content
  - Extracts regular content after '</think>' tag into content
  - No opening '<think>' tag in V3.1 format
- Added detection logic for V3.1 templates based on pattern: 'message['prefix'] is defined and message['prefix'] and thinking'
- Added V3.1 case to parsing switch statement

This addresses the issue where V3.1 outputs reasoning content followed by '</think>' and then regular content without the opening '<think>' tag.
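The reasoning/content split described in this list can be illustrated with a small standalone sketch. This is not the code from common/chat.cpp: the `split_output` struct and the `split_at_think_close()` function are hypothetical names made up for illustration, and the handling of output with no closing tag is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch; not llama.cpp's message type.
struct split_output {
    std::string reasoning_content;
    std::string content;
};

// Split raw V3.1-style output at the first "</think>".
// Assumes thinking mode, where the output starts directly with reasoning
// text because there is no opening "<think>" tag in the output.
inline split_output split_at_think_close(const std::string & raw) {
    static const std::string close_tag = "</think>";
    split_output out;
    const std::size_t pos = raw.find(close_tag);
    if (pos == std::string::npos) {
        // Assumption: with no closing tag, everything is still reasoning.
        out.reasoning_content = raw;
        return out;
    }
    out.reasoning_content = raw.substr(0, pos);
    out.content = raw.substr(pos + close_tag.size());
    return out;
}

// Usage example for the sketch above.
inline void split_example() {
    const split_output r = split_at_think_close("Let me check.</think>The answer is 4.");
    assert(r.reasoning_content == "Let me check.");
    assert(r.content == "The answer is 4.");
}
```

The real parser also has to deal with streaming partial output and tool calls, which this sketch leaves out.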


createthis · 2025-09-05T01:27:14Z

@CISC who approves the workflows?

ggerganov · 2025-09-05T07:57:37Z

> @CISC who approves the workflows?

I think all collaborators can approve them?

Just approved this one.

createthis · 2025-09-05T11:04:58Z

I still don’t see a merge button. Do we need @ngxson to review too?

pwilkin · 2025-09-05T12:14:33Z

@createthis nope, only collaborators with write access can merge, so you need either @CISC or @ggerganov to merge it :>


createthis · 2025-09-06T04:50:17Z

I added an edge case where thinking is forced open, there is tool calling in the reasoning content, but then the model just stops the output without closing the </think> tag, so it's not a partial. In this case, use the tool call in the reasoning content, because the model appears to be confused.
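One way to picture this edge case is as a decision about which part of the raw output gets scanned for tool calls. The sketch below is only an illustration under assumptions: `tool_call_search_region()` and the `is_partial` flag are hypothetical names, and the actual logic in common/chat.cpp may be structured quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: pick the slice of raw output to scan for tool calls.
//   thinking_forced_open: the prompt already opened the reasoning block.
//   is_partial: more output may still arrive (streaming).
inline std::string tool_call_search_region(const std::string & raw,
                                           bool thinking_forced_open,
                                           bool is_partial) {
    static const std::string close_tag = "</think>";
    const std::size_t pos = raw.find(close_tag);
    if (pos != std::string::npos) {
        // Normal case: tool calls are expected after the reasoning block.
        return raw.substr(pos + close_tag.size());
    }
    if (thinking_forced_open && is_partial) {
        // Still inside the reasoning block and more output may come: wait.
        return "";
    }
    // Either non-thinking mode, or the model stopped without ever closing
    // the reasoning block: fall back to scanning the whole text.
    return raw;
}

// Usage example for the sketch above.
inline void search_region_example() {
    assert(tool_call_search_region("reasoning</think>content", true, false) == "content");
    assert(tool_call_search_region("reasoning with a call", true, true).empty());
    assert(tool_call_search_region("reasoning with a call", true, false) == "reasoning with a call");
}
```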

createthis · 2025-09-06T15:26:25Z

@CISC @ggerganov Let me know if you want any more changes, otherwise please merge. This is working well on my end.

createthis · 2025-09-08T01:14:18Z

@CISC @ggerganov I simplified update_cursor with 26b02fa. I also performed an extensive analysis of the cause. You can view my notes here: https://gist.github.com/createthis/dc3098c3abb4ff809d0291c91322f512

TL;DR: After the second function_regex call fails to find a match, it also fails to reset builder.pos(), so block_close fails. update_cursor just resets the cursor to its last position before the failed function_regex call.
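The save-and-restore idea can be shown with a toy cursor-based parser. Everything here is a hypothetical sketch: `mini_parser`, `try_find_regex()`, and `parse_block()` are invented for illustration, and the "cursor is left at the end of input on a failed search" behavior is simulated so that the effect of the reset is visible. It does not claim to reproduce the llama.cpp parser.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>

// Hypothetical minimal parser with a cursor; not the llama.cpp parser class.
class mini_parser {
  public:
    explicit mini_parser(std::string input) : input_(std::move(input)) {}

    std::size_t pos() const { return pos_; }
    void move_to(std::size_t p) { pos_ = p; }

    // Search forward for `re`. On success the cursor lands after the match.
    // On failure the cursor is left at the end of input, which simulates a
    // failed search that does not restore the cursor.
    bool try_find_regex(const std::regex & re) {
        std::smatch m;
        const auto begin = input_.cbegin() + static_cast<std::ptrdiff_t>(pos_);
        if (std::regex_search(begin, input_.cend(), m, re)) {
            pos_ += static_cast<std::size_t>(m.position(0) + m.length(0));
            return true;
        }
        pos_ = input_.size();
        return false;
    }

    // Consume `lit` only if it appears exactly at the cursor.
    bool try_consume_literal(const std::string & lit) {
        if (input_.compare(pos_, lit.size(), lit) != 0) {
            return false;
        }
        pos_ += lit.size();
        return true;
    }

  private:
    std::string input_;
    std::size_t pos_ = 0;
};

// Consume repeated function matches, then the closing literal.
// With update_cursor set, the cursor is restored after the failing search.
inline bool parse_block(mini_parser & p, const std::regex & function_regex,
                        const std::string & block_close, bool update_cursor) {
    while (true) {
        const std::size_t saved = p.pos();
        if (!p.try_find_regex(function_regex)) {
            if (update_cursor) {
                p.move_to(saved);
            }
            break;
        }
    }
    return p.try_consume_literal(block_close);
}

// Usage example: the closing literal is only found when the cursor is reset.
inline void update_cursor_example() {
    const std::regex function_regex("call\\([a-z]\\)");
    mini_parser with_reset("call(a)call(b)END");
    assert(parse_block(with_reset, function_regex, "END", true));
    mini_parser without_reset("call(a)call(b)END");
    assert(!parse_block(without_reset, function_regex, "END", false));
}
```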


createthis · 2025-09-08T15:48:35Z

🎉🎉🎉

fernandaspets · 2025-10-14T06:27:25Z

Is there a way to turn this on and off? I'm working on a tool calling project and would like to have just the model's original output for testing.

createthis · 2025-10-14T12:58:06Z

> Is there a way to turn this on and off? I'm working on a tool calling project and would like to have just the model's original output for testing.

@fernandaspets Hey neolithic. You can see the model's original output by starting llama.cpp with --verbose. It will show up in the log, as well as come through in the json as an extra property. You may be able to turn it off entirely with --reasoning-format=none. I know that turns off parsing of the think tags, but I don't remember if it turns off tool call parsing.

There is an open source CLI tool called mitmproxy: https://www.mitmproxy.org/. You can run it in reverse proxy mode as a man-in-the-middle, and it will record your entire chat session, both the request and the response. This, combined with --verbose, is my favorite method for debugging tool calling issues.

Once I have identified an issue, then I usually write a unit test. You can see my unit test for multiple tool calls here:

llama.cpp/tests/test-chat-parser.cpp, line 303 in 7ea15bb:

    const std::string in = "CONTENT<｜tool▁calls▁begin｜><｜tool▁call▁begin｜>get_time<｜tool▁sep｜>{\"city\": \"Paris\"}<｜tool▁call▁end｜><｜tool▁call▁begin｜>get_weather<｜tool▁sep｜>{\"city\": \"Paris\"}<｜tool▁call▁end｜><｜tool▁calls▁end｜>";
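To make the structure of that test input easier to see, here is a rough standalone sketch that pulls the function names and argument strings out of such a string using plain `std::string::find`. The `tool_call` struct and `parse_v31_tool_calls()` are hypothetical names for illustration only; the parser in common/chat.cpp works differently and also handles partial output and JSON validation, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for one parsed call; arguments stay a raw JSON string.
struct tool_call {
    std::string name;
    std::string arguments;
};

// Extract all calls between the tool-calls begin and end markers.
inline std::vector<tool_call> parse_v31_tool_calls(const std::string & in) {
    static const std::string calls_begin = "<｜tool▁calls▁begin｜>";
    static const std::string call_begin  = "<｜tool▁call▁begin｜>";
    static const std::string sep         = "<｜tool▁sep｜>";
    static const std::string call_end    = "<｜tool▁call▁end｜>";
    static const std::string calls_end   = "<｜tool▁calls▁end｜>";
    const std::size_t npos = std::string::npos;

    std::vector<tool_call> calls;
    std::size_t pos = in.find(calls_begin);
    if (pos == npos) {
        return calls; // no tool call block at all
    }
    pos += calls_begin.size();
    const std::size_t block_end = in.find(calls_end, pos); // may be npos

    while (true) {
        const std::size_t b = in.find(call_begin, pos);
        if (b == npos || (block_end != npos && b > block_end)) {
            break; // no further call inside the block
        }
        const std::size_t name_start = b + call_begin.size();
        const std::size_t s = in.find(sep, name_start);
        if (s == npos) {
            break; // malformed: missing separator
        }
        const std::size_t args_start = s + sep.size();
        const std::size_t e = in.find(call_end, args_start);
        if (e == npos) {
            break; // malformed or truncated call
        }
        calls.push_back({in.substr(name_start, s - name_start),
                         in.substr(args_start, e - args_start)});
        pos = e + call_end.size();
    }
    return calls;
}

// Usage example with the input string from the unit test quoted above.
inline void tool_call_example() {
    const std::string in = "CONTENT<｜tool▁calls▁begin｜><｜tool▁call▁begin｜>get_time<｜tool▁sep｜>{\"city\": \"Paris\"}<｜tool▁call▁end｜><｜tool▁call▁begin｜>get_weather<｜tool▁sep｜>{\"city\": \"Paris\"}<｜tool▁call▁end｜><｜tool▁calls▁end｜>";
    const std::vector<tool_call> calls = parse_v31_tool_calls(in);
    assert(calls.size() == 2);
    assert(calls[0].name == "get_time");
    assert(calls[1].name == "get_weather");
    assert(calls[1].arguments == "{\"city\": \"Paris\"}");
}
```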

…) (#15533)

* Add DeepSeek V3.1 thinking mode support
  - Added COMMON_CHAT_FORMAT_DEEPSEEK_V3_1 enum value
  - Created common_chat_params_init_deepseek_v3_1() function (currently uses R1 implementation)
  - Created common_chat_parse_deepseek_v3_1() function that handles V3.1 thinking format:
    - Extracts reasoning content before '</think>' tag into reasoning_content
    - Extracts regular content after '</think>' tag into content
    - No opening '<think>' tag in V3.1 format
  - Added detection logic for V3.1 templates based on pattern: 'message['prefix'] is defined and message['prefix'] and thinking'
  - Added V3.1 case to parsing switch statement

  This addresses the issue where V3.1 outputs reasoning content followed by '</think>' and then regular content without the opening '<think>' tag.
* Another attempt by V3.1 non-thinking
* Fix test, but it's not asserting anything.
* Ignore vim swap files in tests dir
* Update the test
* Try using try_find_literal instead of regex
* passing test
* Revert "Try using try_find_literal instead of regex"
  This reverts commit c50d887ec2780dd9e6b8b397e92347d3db8d5575.
* Remove unnecessary change
* Remove comment
* Add code to handle non-thinking mode.
* Try to set message['prefix'] when thinking is enabled.
* This fixes reasoning, but breaks normal content. We need state in the chat parser.
* DeepSeek V3.1 thinking is now the default. Disable with `--reasoning-budget 0`.
* Simplify (DeepSeek V3.1 reasoning)
* Fix sign inversion bug
* Add some tool calling code (not working).
* Tool calls working in non-reasoning mode.
* Attempt a unit test for tool call parsing.
* Passing test
* Add tests for both happy path and broken fenced DeepSeek V3.1 tool call variants.
* Passing DeepSeek V3.1 tool call tests, but model is not working.
* Revert assistance response prefill change. Not my monkeys.
* Add fenced_thinking unit test variant. Passes, but thinking tool calling still isn't working for some reason.
* Tests pass in reasoning mode. Also e2e tool test passes.
* Make a copy of the parse_json_tool_calls function for deepseek-v3.1 so as to not accidentally introduce regressions.
* Fix thinking_forced_open logic. tool calling broken. Need to add another test case.
* That's what I get for cargo culting a newline.
* Add multi tool call test for deepseek v3.1 non-reasoning
* Move test, remove .gitignore change
* Place deepseek-v3.1 reasoning test directly into existing reasoning function per CISC's request.
* Address whitespace CI failure.
* Merge two assert_equals per CISC's request.
* Add DeepSeek-V3.1 tests to tests/test-chat.cpp per CISC's request.
* Merge deepseek V3.1 and regular parse_json_tool_calls() function behaviors by adding optional update_cursor argument.
* Update tests/test-chat-parser.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update tests/test-chat-parser.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update tests/test-chat-parser.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update tests/test-chat-parser.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update tests/test-chat-parser.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update tests/test-chat-parser.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update tests/test-chat-parser.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update tests/test-chat-parser.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update tests/test-chat-parser.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* DeepSeek V3.1 fix reasoning_format none
* Strip grammar down to strictly what we expect based on model card. Throw out parts we cargo culted from R1 that don't make sense.
* Update tests/test-chat-parser.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* DeepSeek V3.1 - Add edge case where thinking is forced open, there is tool calling in the reasoning content, but then the model just stops the output without closing the </think> tag, so it's not a partial. In this case, use the tool call in the reasoning content.
* DeepSeek V3.1 - simplify update_cursor
* Update common/chat.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update common/chat.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update common/chat.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Fix indent

---------

Co-authored-by: openhands <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>

openhands-agent and others added 14 commits August 22, 2025 13:31

Another attempt by V3.1 non-thinking

3912fd3

Fix test, but it's not asserting anything.

bac6c99

Ignore vim swap files in tests dir

fe86282

Update the test

3d00d62

Try using try_find_literal instead of regex

c50d887

passing test

3f319aa

Revert "Try using try_find_literal instead of regex"

79f7ca3

This reverts commit c50d887.

Remove unnecessary change

0d959ba

Remove comment

6223c1c

Add code to handle non-thinking mode.

0d372f4

Try to set message['prefix'] when thinking is enabled.

f0da116

This fixes reasoning, but breaks normal content. We need state in the chat parser.

56f7e38

DeepSeek V3.1 thinking is now the default. Disable with `--reasoning-budget 0`.

f4f0ddb

github-actions bot added the testing Everything test related label Aug 23, 2025

createthis mentioned this pull request Aug 23, 2025

Feature Request: Add DeepSeek-V3.1 Native Tool Calling Support (OpenAI Style) #15496

Closed

4 tasks

Simplify (DeepSeek V3.1 reasoning)

f7d2ee9

CISC reviewed Aug 24, 2025

common/chat.cpp (resolved)

tests/.gitignore (outdated, resolved)

tests/test-chat-parser.cpp (outdated, resolved)

Fix sign inversion bug

7ac92ca

createthis requested a review from ngxson as a code owner August 25, 2025 01:50

github-actions bot added examples server labels Aug 25, 2025

createthis marked this pull request as draft August 25, 2025 03:34

createthis added 2 commits August 25, 2025 04:43

Add some tool calling code (not working).

be0b2b8

Tool calls working in non-reasoning mode.

776d95b

createthis marked this pull request as ready for review August 25, 2025 05:42

createthis added 4 commits August 25, 2025 10:25

Attempt a unit test for tool call parsing.

a32cad1

Passing test

52d5488

Add tests for both happy path and broken fenced DeepSeek V3.1 tool call variants.

a839be7

Passing DeepSeek V3.1 tool call tests, but model is not working.

6ade60e

createthis and others added 2 commits September 4, 2025 18:19

Update tests/test-chat-parser.cpp

e3fe1ce

Co-authored-by: Sigbjørn Skjæret <[email protected]>

Merge branch 'master' into deepseek_3_1_thinking_mode

25034eb

createthis marked this pull request as draft September 5, 2025 21:44

DeepSeek V3.1 - Add edge case where thinking is forced open, there is tool calling in the reasoning content, but then the model just stops the output without closing the </think> tag, so it's not a partial. In this case, use the tool call in the reasoning content.

9830e7e

createthis marked this pull request as ready for review September 6, 2025 04:50

createthis added 2 commits September 7, 2025 16:37

Merge branch 'master' into deepseek_3_1_thinking_mode

91bc615

DeepSeek V3.1 - simplify update_cursor

26b02fa

createthis changed the title ~~Deepseek V3.1 thinking mode is the default~~ Deepseek V3.1 tool calling support Sep 8, 2025

createthis changed the title ~~Deepseek V3.1 tool calling support~~ Deepseek V3.1 native tool calling support (OpenAI Style) Sep 8, 2025

CISC approved these changes Sep 8, 2025

common/chat.cpp (outdated, resolved)

common/chat.cpp (outdated, resolved)

common/chat.cpp (outdated, resolved)

createthis and others added 4 commits September 8, 2025 07:11

Update common/chat.cpp

3ccc651

Co-authored-by: Sigbjørn Skjæret <[email protected]>

Update common/chat.cpp

e23eedd

Co-authored-by: Sigbjørn Skjæret <[email protected]>

Update common/chat.cpp

cf17a8e

Co-authored-by: Sigbjørn Skjæret <[email protected]>

Fix indent

4c2179d

CISC merged commit 8802156 into ggml-org:master Sep 8, 2025
48 checks passed

firecoperana mentioned this pull request Sep 9, 2025

Deepseek V3.1 native tool calling support (OpenAI Style) ikawrakow/ik_llama.cpp#771

Merged

4 tasks

KiruyaMomochi mentioned this pull request Nov 13, 2025

Kimi-K2-Thinking native tool calling format #17251

Closed

DajanaV mentioned this pull request Nov 13, 2025

UPSTREAM PR #17251: Kimi-K2-Thinking native tool calling format auroralabs-loci/llama.cpp#202

Open


7 participants
